Section: Software and Platforms

Alpage's linguistic workbench, including SxPipe

Participants: Benoît Sagot [contact], Rosa Stern, Marion Baranes, Damien Nouvel, Virginie Mouilleron, Pierre Boullier, Éric Villemonte de La Clergerie.

See also the web page http://lingwb.gforge.inria.fr/.

Alpage's linguistic workbench is a set of packages for corpus processing and parsing. Among these packages, SxPipe is of particular importance.

SxPipe [80] is a modular and customizable processing chain designed to apply a cascade of surface processing steps to raw corpora. It is used

  • as a preliminary step before Alpage's parsers (e.g., FRMG);

  • for surface processing (named entity recognition, text normalization, unknown word extraction and processing...).
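
The idea of a modular, customizable cascade can be pictured with a short Python sketch. This is only an illustration of the general design, not SxPipe's actual code or interface: the stage functions (`normalize_whitespace`, `segment_sentences`, `tokenize`), the `run_pipeline` helper and the regular expressions are all hypothetical and far simpler than real surface processing.

```python
import re
from typing import Callable, List

# A stage maps a list of text units to a new list of text units.
# (Hypothetical interface, chosen for illustration only.)
Stage = Callable[[List[str]], List[str]]


def normalize_whitespace(units: List[str]) -> List[str]:
    """Collapse runs of whitespace and trim each unit."""
    return [re.sub(r"\s+", " ", u).strip() for u in units]


def segment_sentences(units: List[str]) -> List[str]:
    """Naive sentence segmentation: split after '.', '!' or '?'."""
    sentences: List[str] = []
    for u in units:
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", u) if s)
    return sentences


def tokenize(units: List[str]) -> List[str]:
    """Naive tokenization: separate words from punctuation with spaces."""
    return [" ".join(re.findall(r"\w+|[^\w\s]", u)) for u in units]


def run_pipeline(text: str, stages: List[Stage]) -> List[str]:
    """Apply each stage in order; the output of one feeds the next."""
    units = [text]
    for stage in stages:
        units = stage(units)
    return units


# The cascade is just an ordered list, so stages can be added,
# removed or reordered without touching the others.
DEFAULT_STAGES: List[Stage] = [normalize_whitespace, segment_sentences, tokenize]

if __name__ == "__main__":
    for sentence in run_pipeline("Hello   world. It  works!", DEFAULT_STAGES):
        print(sentence)
```

Because every stage shares the same input and output type, a parser front end could run the full list, while a normalization-only use could pass a shorter list such as `[normalize_whitespace]`.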

Developed for French and other languages, SxPipe includes, among other components, several modules for named entity recognition in raw text, a sentence segmenter and tokenizer, a spelling corrector and compound-word recognizer, and an original context-free pattern recognizer used by several specialized grammars (numbers, impersonal constructions, quotations...). In 2012, SxPipe received renewed attention in the following directions:

  • Support of new languages, most notably German (although this is still at a very preliminary stage of development);

  • Analysis of unknown words, in particular in the context of the ANR project EDyLex and of the collaboration with viavoo; this involves (i) new tools for the automatic pre-classification of unknown words (acronyms, loan words...) and (ii) new morphological analysis tools, most notably automatic tools for constructional morphology (both derivational and compositional), following the results of dedicated corpus-based studies (see Section 6.2 for new results);

  • Development of new local grammars for detecting new types of entities and improvement of existing ones, in the context of the PACTE project (see Section 6.7 for new results).
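
To give a rough sense of what pre-classifying unknown words into coarse categories might look like, here is a toy Python sketch. It is not the method developed in EDyLex or used in SxPipe: the function name `preclassify_unknown_word`, the category labels and the surface heuristics (including the English-looking suffix test for loan words in French text) are assumptions made purely for illustration.

```python
import re


def preclassify_unknown_word(token: str) -> str:
    """Return a coarse, hypothetical category label for an unknown token.

    The rules are simple surface heuristics applied in order;
    the first matching rule wins.
    """
    # Two or more capital letters, optionally dotted: SNCF, S.N.C.F.
    if re.fullmatch(r"(?:[A-Z]\.?){2,}", token):
        return "acronym"
    # Digits with optional decimal or thousands separators: 3,14 or 1.000
    if re.fullmatch(r"\d+(?:[.,]\d+)*", token):
        return "number"
    # An internal hyphen suggests a compound: porte-clés
    if "-" in token.strip("-"):
        return "compound"
    # English-looking suffixes as a crude loan-word cue (assumption)
    if re.search(r"(?:ing|ness|ship)$", token.lower()):
        return "possible_loan_word"
    return "other"


if __name__ == "__main__":
    for word in ["SNCF", "3,14", "porte-clés", "brainstorming", "xyzzy"]:
        print(word, preclassify_unknown_word(word))
```

A realistic system would rely on lexicon lookups and corpus statistics rather than a handful of regular expressions; the sketch only shows the shape of the task, in which each unknown token is routed to a category that later, more specific analysis (for example morphological analysis) can build on.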